Abstract — This paper first introduces a refined version of the 
Azuma-Hoeffding inequality for discrete-parameter martingales 
with uniformly bounded jumps. The refined inequality is used to 
revisit the large deviations analysis of binary hypothesis testing. 

Index Terms — Fisher information, hypothesis testing, large 
deviations, relative entropy. 



I. Introduction 

An analysis of binary hypothesis testing from an
information-theoretic point of view, together with a derivation of
the related error exponents in analogy to optimum channel codes,
was provided in [?]. A nice exposition of the subject is also
provided in [6, Chapter 11], where the exact error exponents
for the large deviations analysis of binary hypothesis testing
are expressed in terms of relative entropies.

The Azuma-Hoeffding inequality is by now a well-known
tool that has often been used to prove concentration
of measure phenomena. It is due to Hoeffding [9], who proved
it first for a sum of independent and bounded RVs, and Azuma
[2], who later extended it to bounded-difference martingales.
For a nice exposition of the martingale approach to
establishing concentration inequalities, the reader is referred
to, e.g., [?] and [11]. The starting point of this work is a
known concentration inequality for discrete-parameter
martingales with uniformly bounded jumps, which
forms a refined version of the Azuma-Hoeffding inequality.
Some of its information-theoretic implications are then studied
in the context of binary hypothesis testing. Specifically,
the tightness of this concentration inequality is studied via
a large deviations analysis of binary hypothesis testing, and
its improved tightness over the Azuma-Hoeffding inequality
is revisited in this context. Along the way, the derived lower
bounds on the error exponents are linked to some
information measures (e.g., the relative entropy and Fisher
information).

This paper is structured as follows: Section II briefly
introduces some preliminary material related to martingales and
Azuma's inequality, and then considers a refined version of
Azuma's inequality. This refined inequality is followed by a
study of its relation to the martingale central limit
theorem. Section III considers the relation of Azuma's
inequality and of its refined version (which
was introduced in Section II) to the large, moderate and small
deviations analysis of binary hypothesis testing. Section IV
concludes the paper, followed by some proofs and complementary
details that are relegated to the appendices.

II. Preliminaries and a New Concentration 
Inequality 

In the following, we briefly present essential background
on the martingale approach that is used in this paper to
derive concentration inequalities. A refined version of Azuma's
inequality is then introduced. This concentration inequality is
applied in the next section to revisit the large deviations
analysis of binary hypothesis testing.

A. Doob's Martingales

This sub-section provides a short background on martingales
to set definitions and notation. For a more thorough study of
martingales, the reader is referred to, e.g., [?].

Definition 1: [Doob's Martingale] Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a
probability space. A Doob's martingale sequence is a sequence
$X_0, X_1, \ldots$ of random variables (RVs) and corresponding sub
$\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \ldots$ (also denoted by $\{X_i, \mathcal{F}_i\}$) that satisfy
the following conditions:

1) $X_i \in L^1(\Omega, \mathcal{F}_i, \mathbb{P})$ for every $i$, i.e., each $X_i$ is defined
on the same sample space $\Omega$, it is measurable with
respect to the corresponding $\sigma$-algebra $\mathcal{F}_i$ (i.e., $X_i$ is
$\mathcal{F}_i$-measurable) and $\mathbb{E}[|X_i|] = \int_{\Omega} |X_i(\omega)| \, d\mathbb{P}(\omega) < \infty$.

2) $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots$ (where this sequence of $\sigma$-algebras is
called a filtration).

3) $X_i = \mathbb{E}[X_{i+1} \,|\, \mathcal{F}_i]$ holds almost surely (a.s.) for every $i$.

For preliminary material on the construction of discrete-time
martingales, see Appendix A (which is relevant to the analysis
in Section III).

B. Azuma's Inequality

Azuma's inequality¹ forms a useful concentration inequality
for bounded-difference martingales [2]. In the following,
this inequality is introduced. The reader is referred to, e.g.,
[?, Chapter 11] and [11] for surveys on concentration
inequalities for (sub/super) martingales.

Theorem 1: [Azuma's inequality] Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a
discrete-parameter real-valued martingale sequence such that,
for every $k \in \mathbb{N}$, the condition $|X_k - X_{k-1}| \le d_k$ holds a.s.
for some non-negative constants $\{d_k\}_{k=1}^{\infty}$. Then

$$\mathbb{P}(|X_n - X_0| \ge r) \le 2 \exp\left(-\frac{r^2}{2 \sum_{k=1}^{n} d_k^2}\right), \quad \forall \, r \ge 0. \tag{1}$$

¹Azuma's inequality is also known as the Azuma-Hoeffding inequality.
Since this inequality is referred to several times in this paper, it is named
from this point on as Azuma's inequality for the sake of brevity.



The concentration inequality stated in Theorem 1 was
proved in [9] for independent bounded random variables,
followed by a discussion on sums of dependent random
variables; this inequality was later derived in [2] for
bounded-difference martingales. For a proof of Theorem 1 see, e.g., [5]
and [8, Chapter 2.4].
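
As a minimal numerical sketch of Theorem 1, the right-hand side of (1) can be evaluated as follows. The function name `azuma_bound` and the example values (n = 100 steps with unit jump bounds) are illustrative assumptions and do not appear in the analysis above.

```python
import math

def azuma_bound(r, d):
    """Right-hand side of (1): 2*exp(-r^2 / (2*sum_k d_k^2)).

    r -- deviation level (r >= 0)
    d -- sequence of the jump bounds d_1, ..., d_n (not all zero)
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    sum_sq = sum(dk * dk for dk in d)
    return 2.0 * math.exp(-r * r / (2.0 * sum_sq))

# Hypothetical example: n = 100 steps, each jump bounded by 1.
n = 100
bound = azuma_bound(r=30.0, d=[1.0] * n)  # equals 2*exp(-900/200)
```

The bound is trivial (equal to 2) at r = 0 and decays with r at a rate governed only by the sum of the squared jump bounds.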

C. A Refined Version of Azuma's Inequality 

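Theorem 2: Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a discrete-parameter
real-valued martingale. Assume that, for some constants $d, \sigma > 0$,
the following two requirements are satisfied a.s.

$$|X_k - X_{k-1}| \le d,$$

$$\mathrm{Var}(X_k \,|\, \mathcal{F}_{k-1}) = \mathbb{E}\bigl[(X_k - X_{k-1})^2 \,|\, \mathcal{F}_{k-1}\bigr] \le \sigma^2$$

for every $k \in \{1, \ldots, n\}$. Then, for every $\alpha \ge 0$,

$$\mathbb{P}(|X_n - X_0| \ge \alpha n) \le 2 \exp\left(-n \, D\!\left(\frac{\delta+\gamma}{1+\gamma} \,\Big\|\, \frac{\gamma}{1+\gamma}\right)\right) \tag{2}$$

where

$$\gamma \triangleq \frac{\sigma^2}{d^2}, \quad \delta \triangleq \frac{\alpha}{d} \tag{3}$$

and

$$D(p\|q) \triangleq p \ln\left(\frac{p}{q}\right) + (1-p) \ln\left(\frac{1-p}{1-q}\right), \quad \forall \, p, q \in [0,1] \tag{4}$$

is the divergence (a.k.a. relative entropy or Kullback-Leibler
distance) between the two probability distributions $(p, 1-p)$
and $(q, 1-q)$. If $\delta > 1$, then the probability on the left-hand
side of (2) is equal to zero.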

Proof: The idea of the proof of Theorem 2 is essentially
similar to the proof of [8, Corollary 2.4.7]. The full proof is
provided in [12, Section III]. ■
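
To make the comparison between (1) and (2) concrete, the following Python sketch evaluates both bounds for equal jump bounds $d_k = d$ and $r = \alpha n$. The function names and the numbers used (d = 1, conditional variance at most 0.25, n = 200) are hypothetical choices for illustration only.

```python
import math

def binary_kl(p, q):
    """D(p||q) in (4), in nats, with the convention 0*ln(0) = 0."""
    def term(a, b):
        if a == 0.0:
            return 0.0
        if b == 0.0:
            return math.inf
        return a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)

def refined_bound(alpha, n, d, sigma2):
    """Right-hand side of (2) for P(|X_n - X_0| >= alpha*n)."""
    gamma = sigma2 / (d * d)   # gamma in (3)
    delta = alpha / d          # delta in (3)
    if delta > 1.0:
        return 0.0             # the event has zero probability in this case
    p = (delta + gamma) / (1.0 + gamma)
    q = gamma / (1.0 + gamma)
    return 2.0 * math.exp(-n * binary_kl(p, q))

def azuma_bound_equal_jumps(alpha, n, d):
    """Right-hand side of (1) with d_k = d for all k and r = alpha*n."""
    return 2.0 * math.exp(-n * (alpha / d) ** 2 / 2.0)

# Hypothetical numbers: jumps bounded by 1, conditional variance at most 0.25.
refined = refined_bound(alpha=0.2, n=200, d=1.0, sigma2=0.25)
azuma = azuma_bound_equal_jumps(alpha=0.2, n=200, d=1.0)
```

With these numbers the exponent of the refined bound is more than three times the exponent $\delta^2/2$ of Azuma's inequality, since the variance constraint ($\gamma = 0.25$) is much stronger than the jump constraint alone.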
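Proposition 1: Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a discrete-parameter
real-valued martingale that satisfies the conditions of Theorem 2.
Then, for every $\alpha \ge 0$,

$$\mathbb{P}\bigl(|X_n - X_0| \ge \alpha \sqrt{n}\bigr) \le 2 \exp\left(-\frac{\delta^2}{2\gamma}\right) \left(1 + O\bigl(n^{-1/2}\bigr)\right) \tag{5}$$

with $\gamma$ and $\delta$ as defined in (3).

Proof: This inequality follows from Theorem 2 (see [12,
Appendix G]). ■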

III. Binary Hypothesis Testing 

Binary hypothesis testing for finite alphabet models was
analyzed via the method of types, e.g., in [6, Chapter 11] and
[?]. It is assumed that the data sequence is of a fixed length
(n), and one wishes to make the optimal decision (via the
Neyman-Pearson ratio test) based on the received sequence.

Let the RVs $X_1, X_2, \ldots$ be i.i.d. $\sim Q$, and consider two
hypotheses:

• $H_1 : Q = P_1$.
• $H_2 : Q = P_2$.

For simplicity of the analysis, let us assume that the RVs
are discrete and take their values in a finite alphabet $\mathcal{X}$, where
$P_1(x), P_2(x) > 0$ for every $x \in \mathcal{X}$.
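In the following, let

$$L(X_1, \ldots, X_n) \triangleq \ln \frac{P_1^n(X_1, \ldots, X_n)}{P_2^n(X_1, \ldots, X_n)} = \sum_{i=1}^{n} \ln \frac{P_1(X_i)}{P_2(X_i)}$$

designate the log-likelihood ratio. By the strong law of large
numbers (SLLN), if hypothesis $H_1$ is true, then a.s.

$$\lim_{n \to \infty} \frac{L(X_1, \ldots, X_n)}{n} = D(P_1 \| P_2) \tag{6}$$

and otherwise, if hypothesis $H_2$ is true, then a.s.

$$\lim_{n \to \infty} \frac{L(X_1, \ldots, X_n)}{n} = -D(P_2 \| P_1) \tag{7}$$
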



where the above assumptions on the probability mass functions
$P_1$ and $P_2$ imply that the relative entropies, $D(P_1 \| P_2)$ and
$D(P_2 \| P_1)$, are both finite. Consider the case where for some
fixed constants $\overline{\lambda}, \underline{\lambda} \in \mathbb{R}$ where

$$-D(P_2 \| P_1) < \underline{\lambda} \le \overline{\lambda} < D(P_1 \| P_2)$$

one decides on hypothesis $H_1$ if

$$L(X_1, \ldots, X_n) > n \overline{\lambda}$$

and on hypothesis $H_2$ if

$$L(X_1, \ldots, X_n) < n \underline{\lambda}.$$

Note that if $\overline{\lambda} = \underline{\lambda} \triangleq \lambda$ then a decision on the two hypotheses
is based on comparing the normalized log-likelihood ratio
(w.r.t. $n$) to a single threshold ($\lambda$), and deciding on hypothesis
$H_1$ or $H_2$ if this normalized log-likelihood ratio is, respectively,
above or below $\lambda$. If $\underline{\lambda} < \overline{\lambda}$ then one decides on $H_1$
or $H_2$ if the normalized log-likelihood ratio is, respectively,
above the upper threshold $\overline{\lambda}$ or below the lower threshold $\underline{\lambda}$.
Otherwise, if the normalized log-likelihood ratio is between
the upper and lower thresholds, then an erasure is declared
and no decision is taken in this case.
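
The two-threshold decision rule described above can be sketched in Python as follows. The function names, the dictionary representation of the probability mass functions, and the numerical values are assumptions made for this illustration; the thresholds are chosen inside the admissible interval $(-D(P_2\|P_1), D(P_1\|P_2))$.

```python
import math

def log_likelihood_ratio(samples, p1, p2):
    """L(X_1,...,X_n) = sum_i ln(P1(X_i)/P2(X_i)) for pmfs given as dicts."""
    return sum(math.log(p1[x] / p2[x]) for x in samples)

def decide(samples, p1, p2, lam_low, lam_high):
    """Two-threshold rule: returns 'H1', 'H2', or 'erasure'."""
    if lam_low > lam_high:
        raise ValueError("lam_low must not exceed lam_high")
    n = len(samples)
    llr = log_likelihood_ratio(samples, p1, p2)
    if llr > n * lam_high:
        return "H1"
    if llr < n * lam_low:
        return "H2"
    return "erasure"

# Hypothetical binary pmfs and thresholds (illustrative values only).
p1 = {0: 0.4, 1: 0.6}
p2 = {0: 0.6, 1: 0.4}
result_h1 = decide([1, 1, 1, 1], p1, p2, -0.05, 0.05)
result_h2 = decide([0, 0, 0, 0], p1, p2, -0.05, 0.05)
result_erasure = decide([0, 1, 0, 1], p1, p2, -0.05, 0.05)
```

In this sketch a log-likelihood ratio that falls exactly on a threshold is treated as an erasure, which is one possible convention for the boundary case.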
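Let

$$\alpha_n^{(1)} \triangleq P_1^n\bigl(L(X_1, \ldots, X_n) \le n \overline{\lambda}\bigr) \tag{8}$$

$$\alpha_n^{(2)} \triangleq P_1^n\bigl(L(X_1, \ldots, X_n) \le n \underline{\lambda}\bigr) \tag{9}$$

and

$$\beta_n^{(1)} \triangleq P_2^n\bigl(L(X_1, \ldots, X_n) \ge n \underline{\lambda}\bigr) \tag{10}$$

$$\beta_n^{(2)} \triangleq P_2^n\bigl(L(X_1, \ldots, X_n) \ge n \overline{\lambda}\bigr) \tag{11}$$

then $\alpha_n^{(1)}$ and $\beta_n^{(1)}$ are the probabilities of either making an
error or declaring an erasure under, respectively, hypotheses
$H_1$ and $H_2$; similarly, $\alpha_n^{(2)}$ and $\beta_n^{(2)}$ are the probabilities of
making an error under hypotheses $H_1$ and $H_2$, respectively.

Let $\pi_1, \pi_2 \in (0, 1)$ denote the a-priori probabilities of the
hypotheses $H_1$ and $H_2$, respectively, so

$$P_{e,n}^{(1)} \triangleq \pi_1 \alpha_n^{(1)} + \pi_2 \beta_n^{(1)} \tag{12}$$

is the probability of having either an error or an erasure, and

$$P_{e,n}^{(2)} \triangleq \pi_1 \alpha_n^{(2)} + \pi_2 \beta_n^{(2)} \tag{13}$$

is the probability of error.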
A. Exact Exponents

When we let $n$ tend to infinity, the exact exponents of $\alpha_n^{(j)}$
and $\beta_n^{(j)}$ ($j = 1, 2$) are derived via Cramér's theorem. The
resulting exponents form a straightforward generalization of,
e.g., [8, Theorem 3.4.3] and [10, Theorem 6.4], which address
the case where the decision is made based on a single threshold
of the log-likelihood ratio. In this particular case where
$\overline{\lambda} = \underline{\lambda} \triangleq \lambda$, the option of erasures does not exist, and
$P_{e,n} \triangleq P_{e,n}^{(1)} = P_{e,n}^{(2)}$ is the error probability.

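In the considered general case with erasures, let

$$\lambda_1 \triangleq -\overline{\lambda}, \quad \lambda_2 \triangleq -\underline{\lambda}$$

then Cramér's theorem on $\mathbb{R}$ yields that the exact exponents
of $\alpha_n^{(1)}$, $\alpha_n^{(2)}$, $\beta_n^{(1)}$ and $\beta_n^{(2)}$ are given by

$$\lim_{n \to \infty} -\frac{\ln \alpha_n^{(1)}}{n} = I(\lambda_1) \tag{14}$$

$$\lim_{n \to \infty} -\frac{\ln \alpha_n^{(2)}}{n} = I(\lambda_2) \tag{15}$$

$$\lim_{n \to \infty} -\frac{\ln \beta_n^{(1)}}{n} = I(\lambda_2) - \lambda_2 \tag{16}$$

$$\lim_{n \to \infty} -\frac{\ln \beta_n^{(2)}}{n} = I(\lambda_1) - \lambda_1 \tag{17}$$

where the rate function $I$ is given by

$$I(r) \triangleq \sup_{t \in \mathbb{R}} \bigl(t r - H(t)\bigr) \tag{18}$$

and

$$H(t) = \ln \left( \sum_{x \in \mathcal{X}} P_1(x)^{1-t} P_2(x)^{t} \right), \quad \forall \, t \in \mathbb{R}. \tag{19}$$
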

The rate function $I$ is convex, lower semi-continuous (l.s.c.)
and non-negative (see, e.g., [8] and [10]). Note that

$$H(t) = (t-1) \, D_t(P_2 \| P_1)$$

where $D_t(P \| Q)$ designates Rényi's information divergence of
order $t$, and $I$ in (18) is the Fenchel-Legendre transform of $H$
(see, e.g., [8, Definition 2.2.2]).

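From (12)-(17), the exact exponents of $P_{e,n}^{(1)}$ and $P_{e,n}^{(2)}$ are
equal to

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}^{(1)}}{n} = \min\bigl\{ I(\lambda_1), \, I(\lambda_2) - \lambda_2 \bigr\} \tag{20}$$

and

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}^{(2)}}{n} = \min\bigl\{ I(\lambda_2), \, I(\lambda_1) - \lambda_1 \bigr\}. \tag{21}$$
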



For the case where the decision is based on a single
threshold for the log-likelihood ratio (i.e., $\lambda_1 = \lambda_2 \triangleq \lambda$),
$P_{e,n} \triangleq P_{e,n}^{(1)} = P_{e,n}^{(2)}$, and its error exponent is equal to

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}}{n} = \min\bigl\{ I(\lambda), \, I(\lambda) - \lambda \bigr\} \tag{22}$$

which coincides with the error exponent in [8, Theorem 3.4.3]
(or [10, Theorem 6.4]). The optimal threshold for obtaining
the best error exponent of the error probability $P_{e,n}$ is equal
to zero (i.e., $\lambda = 0$); in this case, the exact error exponent is
equal to

$$I(0) = -\min_{0 \le t \le 1} \ln \left( \sum_{x \in \mathcal{X}} P_1(x)^{1-t} P_2(x)^{t} \right) \triangleq C(P_1, P_2) \tag{23}$$

which is the Chernoff information of the probability measures
$P_1$ and $P_2$ (see [6, Eq. (11.239)]), and it is symmetric (i.e.,
$C(P_1, P_2) = C(P_2, P_1)$). Note that, from (18), $I(0) =
\sup_{t \in \mathbb{R}} (-H(t)) = -\inf_{t \in \mathbb{R}} H(t)$; the minimization in (23)
over the interval $[0, 1]$ (instead of taking the infimum of $H$
over $\mathbb{R}$) is due to the fact that $H(0) = H(1) = 0$ and the
function $H$ in (19) is convex, so it is enough to restrict the
infimum of $H$ to the closed interval $[0, 1]$, over which it is
attained as a minimum.
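
The Chernoff information in (23) can be approximated numerically by minimizing $H$ in (19) over a grid on $[0, 1]$. The following Python sketch does so; the grid search, the function names and the ternary probability mass functions are assumptions of this illustration (a finer grid or a convex solver could be used instead).

```python
import math

def H(t, p1, p2):
    """H(t) in (19): ln of sum_x P1(x)^(1-t) * P2(x)^t."""
    return math.log(sum(a ** (1.0 - t) * b ** t for a, b in zip(p1, p2)))

def chernoff_information(p1, p2, grid=10001):
    """C(P1,P2) = -min over t in [0,1] of H(t), approximated on a uniform grid."""
    return -min(H(k / (grid - 1), p1, p2) for k in range(grid))

def relative_entropy(p, q):
    """D(P||Q) in nats for strictly positive pmfs given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Hypothetical ternary pmfs (illustrative values only).
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]
c12 = chernoff_information(p1, p2)
c21 = chernoff_information(p2, p1)
```

Since $H$ is convex with $H(0) = H(1) = 0$, the grid minimum is non-positive, so the computed value is non-negative; it is also symmetric in its two arguments and does not exceed either relative entropy.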



B. Lower Bound on the Exponents via Theorem 2

In the following, the tightness of Theorem 2 is examined
by using it for the derivation of lower bounds on the error
exponent and the exponent of the event of having either an
error or an erasure. These results will be compared in the
next sub-section to the exact exponents from the previous
sub-section.

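We first derive a lower bound on the exponent of $\alpha_n^{(1)}$.
Under hypothesis $H_1$, let us construct the martingale sequence
$\{U_k, \mathcal{F}_k\}_{k=0}^{n}$ where $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_n$ is the filtration

$$\mathcal{F}_0 = \{\emptyset, \Omega\}, \quad \mathcal{F}_k = \sigma(X_1, \ldots, X_k), \quad \forall \, k \in \{1, \ldots, n\}$$

and

$$U_k = \mathbb{E}_{P_1^n}\bigl[ L(X_1, \ldots, X_n) \,|\, \mathcal{F}_k \bigr]. \tag{24}$$

For every $k \in \{0, \ldots, n\}$

$$\begin{aligned}
U_k &= \mathbb{E}_{P_1^n}\left[ \sum_{i=1}^{n} \ln \frac{P_1(X_i)}{P_2(X_i)} \,\Big|\, \mathcal{F}_k \right] \\
&= \sum_{i=1}^{k} \ln \frac{P_1(X_i)}{P_2(X_i)} + \sum_{i=k+1}^{n} \mathbb{E}_{P_1^n}\left[ \ln \frac{P_1(X_i)}{P_2(X_i)} \right] \\
&= \sum_{i=1}^{k} \ln \frac{P_1(X_i)}{P_2(X_i)} + (n-k) \, D(P_1 \| P_2).
\end{aligned}$$

In particular

$$U_0 = n \, D(P_1 \| P_2), \tag{25}$$

$$U_n = L(X_1, \ldots, X_n) \tag{26}$$

and, for every $k \in \{1, \ldots, n\}$,

$$U_k - U_{k-1} = \ln \frac{P_1(X_k)}{P_2(X_k)} - D(P_1 \| P_2). \tag{27}$$

Let

$$d_1 \triangleq \max_{x \in \mathcal{X}} \left| \ln \frac{P_1(x)}{P_2(x)} - D(P_1 \| P_2) \right| \tag{28}$$

so $d_1 < \infty$ since by assumption the alphabet set $\mathcal{X}$ is finite,
and $P_1(x), P_2(x) > 0$ for every $x \in \mathcal{X}$. From (27) and (28)

$$|U_k - U_{k-1}| \le d_1$$

holds a.s. for every $k \in \{1, \ldots, n\}$, and

$$\mathbb{E}_{P_1^n}\bigl[ (U_k - U_{k-1})^2 \,|\, \mathcal{F}_{k-1} \bigr] = \sum_{x \in \mathcal{X}} P_1(x) \left( \ln \frac{P_1(x)}{P_2(x)} - D(P_1 \| P_2) \right)^2 \triangleq \sigma_1^2. \tag{29}$$

Let

$$\varepsilon_{1,1} = D(P_1 \| P_2) - \overline{\lambda}, \quad \varepsilon_{2,1} = D(P_2 \| P_1) + \underline{\lambda} \tag{30}$$

$$\varepsilon_{1,2} = D(P_1 \| P_2) - \underline{\lambda}, \quad \varepsilon_{2,2} = D(P_2 \| P_1) + \overline{\lambda}. \tag{31}$$
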


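The probability of making an erroneous decision on hypothesis
$H_2$ or declaring an erasure under the hypothesis $H_1$ is equal
to $\alpha_n^{(1)}$, and from Theorem 2

$$\alpha_n^{(1)} \triangleq P_1^n\bigl(L(X_1, \ldots, X_n) \le n \overline{\lambda}\bigr) \overset{(a)}{=} P_1^n\bigl(U_n - U_0 \le -\varepsilon_{1,1} \, n\bigr) \tag{32}$$

$$\overset{(b)}{\le} \exp\left( -n \, D\!\left( \frac{\delta_{1,1} + \gamma_1}{1 + \gamma_1} \,\Big\|\, \frac{\gamma_1}{1 + \gamma_1} \right) \right) \tag{33}$$

where equality (a) follows from (25), (26) and (30), and
inequality (b) follows from Theorem 2 (in its one-sided form,
hence without the factor of 2) with

$$\gamma_1 \triangleq \frac{\sigma_1^2}{d_1^2}, \quad \delta_{1,1} \triangleq \frac{\varepsilon_{1,1}}{d_1}. \tag{34}$$

Note that if $\varepsilon_{1,1} > d_1$ then it follows from (27) and (28) that
$\alpha_n^{(1)}$ is zero; in this case $\delta_{1,1} > 1$, so the divergence in (33)
is infinity and the upper bound is also equal to zero. Hence,
it is assumed without loss of generality that $\delta_{1,1} \in [0, 1]$.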

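Similarly to (24), under hypothesis $H_2$, let us define the
martingale sequence $\{U_k, \mathcal{F}_k\}_{k=0}^{n}$ with the same filtration and

$$U_k = \mathbb{E}_{P_2^n}\bigl[ L(X_1, \ldots, X_n) \,|\, \mathcal{F}_k \bigr], \quad \forall \, k \in \{0, \ldots, n\}. \tag{35}$$

For every $k \in \{0, \ldots, n\}$

$$U_k = \sum_{i=1}^{k} \ln \frac{P_1(X_i)}{P_2(X_i)} - (n-k) \, D(P_2 \| P_1)$$

and in particular

$$U_0 = -n \, D(P_2 \| P_1), \quad U_n = L(X_1, \ldots, X_n). \tag{36}$$

For every $k \in \{1, \ldots, n\}$,

$$U_k - U_{k-1} = \ln \frac{P_1(X_k)}{P_2(X_k)} + D(P_2 \| P_1). \tag{37}$$

Let

$$d_2 \triangleq \max_{x \in \mathcal{X}} \left| \ln \frac{P_2(x)}{P_1(x)} - D(P_2 \| P_1) \right| \tag{38}$$

then, the jumps of the latter martingale sequence are uniformly
bounded by $d_2$ and, similarly to (29), for every $k \in \{1, \ldots, n\}$

$$\mathbb{E}_{P_2^n}\bigl[ (U_k - U_{k-1})^2 \,|\, \mathcal{F}_{k-1} \bigr] = \sum_{x \in \mathcal{X}} P_2(x) \left( \ln \frac{P_2(x)}{P_1(x)} - D(P_2 \| P_1) \right)^2 \triangleq \sigma_2^2. \tag{39}$$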



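Hence, it follows from Theorem 2 that

$$\beta_n^{(1)} \triangleq P_2^n\bigl(L(X_1, \ldots, X_n) \ge n \underline{\lambda}\bigr) = P_2^n\bigl(U_n - U_0 \ge \varepsilon_{2,1} \, n\bigr) \tag{40}$$

$$\le \exp\left( -n \, D\!\left( \frac{\delta_{2,1} + \gamma_2}{1 + \gamma_2} \,\Big\|\, \frac{\gamma_2}{1 + \gamma_2} \right) \right) \tag{41}$$

where the equality in (40) holds due to (36) and (30), and (41)
follows from Theorem 2 with

$$\gamma_2 \triangleq \frac{\sigma_2^2}{d_2^2}, \quad \delta_{2,1} \triangleq \frac{\varepsilon_{2,1}}{d_2} \tag{42}$$

and $d_2$, $\sigma_2$ are introduced, respectively, in (38) and (39).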

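From (12), (33) and (41), the exponent of the probability of
either having an error or an erasure is lower bounded by

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}^{(1)}}{n} \ge \min_{i=1,2} D\!\left( \frac{\delta_{i,1} + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right). \tag{43}$$

Similarly to the above analysis, one gets from (13) and (31)
that the error exponent is lower bounded by

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}^{(2)}}{n} \ge \min_{i=1,2} D\!\left( \frac{\delta_{i,2} + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right) \tag{44}$$

where

$$\delta_{1,2} \triangleq \frac{\varepsilon_{1,2}}{d_1}, \quad \delta_{2,2} \triangleq \frac{\varepsilon_{2,2}}{d_2}. \tag{45}$$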

For the case of a single threshold (i.e., $\overline{\lambda} = \underline{\lambda} \triangleq \lambda$),
(43) and (44) coincide, and one obtains that the error exponent
satisfies

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}}{n} \ge \min_{i=1,2} D\!\left( \frac{\delta_i + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right) \tag{46}$$

where $\delta_i$ is the common value of $\delta_{i,1}$ and $\delta_{i,2}$ (for $i = 1, 2$).
In this special case, the zero threshold is optimal (see, e.g.,
[8, p. 93]), which then yields that (46) is satisfied with

$$\delta_1 = \frac{D(P_1 \| P_2)}{d_1}, \quad \delta_2 = \frac{D(P_2 \| P_1)}{d_2} \tag{47}$$

with $d_1$ and $d_2$ from (28) and (38), respectively. The right-hand
side of (46) forms a lower bound on the Chernoff information,
which is the exact error exponent for this special case.
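
As a numerical illustration of (46) and (47), the following Python sketch computes $d_i$, $\sigma_i^2$, $\gamma_i$ and $\delta_i$ from two probability mass functions and compares the resulting lower bound with a grid approximation of the Chernoff information. All function names and the ternary probability mass functions are hypothetical choices made for this sketch.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) in nats for strictly positive pmfs given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def binary_kl(p, q):
    """D(p||q) in (4); assumes 0 < q < 1 and 0 <= p <= 1."""
    out = 0.0
    if p > 0.0:
        out += p * math.log(p / q)
    if p < 1.0:
        out += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return out

def jump_parameters(p, q):
    """Return (delta, gamma) for the martingale built under P = p."""
    div = relative_entropy(p, q)
    centered = [math.log(a / b) - div for a, b in zip(p, q)]
    d = max(abs(c) for c in centered)                      # as in (28), (38)
    sigma2 = sum(a * c * c for a, c in zip(p, centered))   # as in (29), (39)
    return div / d, sigma2 / (d * d)                       # (47) and (34), (42)

def refined_exponent(p1, p2):
    """Right-hand side of (46) for the zero threshold."""
    values = []
    for p, q in ((p1, p2), (p2, p1)):
        delta, gamma = jump_parameters(p, q)
        values.append(binary_kl((delta + gamma) / (1.0 + gamma),
                                gamma / (1.0 + gamma)))
    return min(values)

def chernoff_information(p1, p2, grid=10001):
    """Grid approximation of (23)."""
    def h(t):
        return math.log(sum(a ** (1.0 - t) * b ** t for a, b in zip(p1, p2)))
    return -min(h(k / (grid - 1)) for k in range(grid))

# Hypothetical ternary pmfs (illustrative values only).
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]
lower = refined_exponent(p1, p2)
exact = chernoff_information(p1, p2)
```

For these particular values the lower bound is roughly 0.067 while the Chernoff information is roughly 0.070, so the bound from Theorem 2 is close to the exact exponent here.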

C. Comparison of the Lower Bounds on the Exponents with
those that Follow from Azuma's Inequality

The lower bounds on the error exponent and the exponent
of the probability of having either errors or erasures, which
were derived in the previous sub-section via Theorem 2, are
compared in the following to the loosened lower bounds on
these exponents that follow from Azuma's inequality.

We first obtain upper bounds on $\alpha_n^{(1)}$, $\alpha_n^{(2)}$, $\beta_n^{(1)}$ and $\beta_n^{(2)}$ via
Azuma's inequality, and then use them to derive lower bounds
on the exponents of $P_{e,n}^{(1)}$ and $P_{e,n}^{(2)}$.



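From (27), (28), (32), (34) and Azuma's inequality

$$\alpha_n^{(1)} \le \exp\left( -\frac{\delta_{1,1}^2 \, n}{2} \right) \tag{48}$$

and, similarly, from (37), (38), (40), (42) and Azuma's
inequality

$$\beta_n^{(1)} \le \exp\left( -\frac{\delta_{2,1}^2 \, n}{2} \right). \tag{49}$$

From (9), (11), (31), (45) and Azuma's inequality

$$\alpha_n^{(2)} \le \exp\left( -\frac{\delta_{1,2}^2 \, n}{2} \right) \tag{50}$$

$$\beta_n^{(2)} \le \exp\left( -\frac{\delta_{2,2}^2 \, n}{2} \right). \tag{51}$$

Therefore, it follows from (12), (13) and (48)-(51) that the
resulting lower bounds on the exponents of $P_{e,n}^{(1)}$ and $P_{e,n}^{(2)}$ are

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}^{(j)}}{n} \ge \min_{i=1,2} \frac{\delta_{i,j}^2}{2}, \quad j = 1, 2 \tag{52}$$

as compared to (43) and (44) which give, for $j = 1, 2$,

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}^{(j)}}{n} \ge \min_{i=1,2} D\!\left( \frac{\delta_{i,j} + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right). \tag{53}$$
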



For the specific case of a zero threshold, the lower bound on
the error exponent which follows from Azuma's inequality is
given by

$$\lim_{n \to \infty} -\frac{\ln P_{e,n}}{n} \ge \min_{i=1,2} \frac{\delta_i^2}{2} \tag{54}$$

with the values of $\delta_1$ and $\delta_2$ in (47).

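The lower bounds on the exponents in (52) and (53) are
compared in the following. Note that the lower bounds in (52)
are loosened as compared to those in (53) since they follow,
respectively, from Azuma's inequality and its improvement in
Theorem 2.

The divergence in the exponent of (53) is equal to

$$\begin{aligned}
D\!\left( \frac{\delta_{i,j} + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right)
&= \frac{\delta_{i,j} + \gamma_i}{1 + \gamma_i} \ln\left( 1 + \frac{\delta_{i,j}}{\gamma_i} \right) + \frac{1 - \delta_{i,j}}{1 + \gamma_i} \ln(1 - \delta_{i,j}) \\
&= \frac{\gamma_i}{1 + \gamma_i} \left[ \left( 1 + \frac{\delta_{i,j}}{\gamma_i} \right) \ln\left( 1 + \frac{\delta_{i,j}}{\gamma_i} \right) + \frac{(1 - \delta_{i,j}) \ln(1 - \delta_{i,j})}{\gamma_i} \right].
\end{aligned} \tag{55}$$

Lemma 1:

$$(1+u) \ln(1+u) \ge \begin{cases} u + \dfrac{u^2}{2}, & u \in [-1, 0] \\[2mm] u + \dfrac{u^2}{2} - \dfrac{u^3}{6}, & u \ge 0 \end{cases} \tag{56}$$

where at $u = -1$, the left-hand side is defined to be zero (it
is the limit of this function when $u \to -1$ from above).

Proof: The proof follows by elementary calculus. ■

Since $\delta_{i,j} \in [0, 1]$, then (55) and Lemma 1 imply that

$$D\!\left( \frac{\delta_{i,j} + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right) \ge \frac{\delta_{i,j}^2}{2 \gamma_i} - \frac{\delta_{i,j}^3}{6 \gamma_i^2 (1 + \gamma_i)}. \tag{57}$$

Hence, by comparing (52) with the combination of (53) and
(57), it follows that (up to a second-order approximation)
the lower bounds on the exponents that were derived via
Theorem 2 are improved by at least a factor of $(\max_i \gamma_i)^{-1}$
as compared to those that follow from Azuma's inequality.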
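
The ordering between the three exponents discussed above (the divergence in (53), its cubic lower bound in (57), and the exponent $\delta^2/2$ from Azuma's inequality) can be checked numerically. The following Python sketch scans a grid of $(\delta, \gamma)$ values with $\gamma \le 1$; the function names, the grid and the numerical tolerance are assumptions made for this illustration.

```python
import math

def binary_kl(p, q):
    """D(p||q) in (4); assumes 0 < q < 1 and 0 <= p <= 1."""
    out = 0.0
    if p > 0.0:
        out += p * math.log(p / q)
    if p < 1.0:
        out += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return out

def refined_exponent(delta, gamma):
    """Divergence in (53) and (55) for 0 <= delta <= 1 and gamma > 0."""
    return binary_kl((delta + gamma) / (1.0 + gamma), gamma / (1.0 + gamma))

def cubic_lower_bound(delta, gamma):
    """Right-hand side of (57)."""
    return (delta ** 2 / (2.0 * gamma)
            - delta ** 3 / (6.0 * gamma ** 2 * (1.0 + gamma)))

def azuma_exponent(delta):
    """Exponent delta^2 / 2 that follows from Azuma's inequality, as in (52)."""
    return delta ** 2 / 2.0

# Scan a hypothetical grid: delta in [0, 1], gamma in (0, 1].
violations = 0
for i in range(0, 101):
    delta = i / 100.0
    for j in range(1, 101):
        gamma = j / 100.0
        e = refined_exponent(delta, gamma)
        if e < cubic_lower_bound(delta, gamma) - 1e-12:
            violations += 1
        if e < azuma_exponent(delta) - 1e-12:
            violations += 1
```

The scan is only a sanity check on a finite grid, not a proof; the inequality (57) itself follows from Lemma 1.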

Example 1: Consider two probability measures $P_1$ and $P_2$
where

$$P_1(0) = P_2(1) = 0.4, \quad P_1(1) = P_2(0) = 0.6,$$

and the case of a single threshold of the log-likelihood ratio
that is set to zero (i.e., $\lambda = 0$). The exact error exponent in
this case is the Chernoff information, which is equal to

$$C(P_1, P_2) = 2.04 \cdot 10^{-2}.$$

The improved lower bound on the error exponent in (46) and
(47) is equal to $1.77 \cdot 10^{-2}$, whereas the loosened lower bound
in (54) is equal to $1.39 \cdot 10^{-2}$. In this case $\gamma_1 = \frac{2}{3}$ and $\gamma_2 = \frac{7}{9}$,
so the improvement in the lower bound on the error exponent
is indeed by a factor of approximately $(\max_i \gamma_i)^{-1} = \frac{9}{7}$. Note
that, from (33), (41) and (48)-(51), these are lower bounds on
the error exponents for any finite block length $n$, and not only
asymptotically in the limit where $n \to \infty$. The operational
meaning of this example is that the improved lower bound on
the error exponent assures that a fixed error probability can
be obtained based on a sequence of i.i.d. RVs whose length is
reduced by 22.2% as compared to the loosened bound which
follows from Azuma's inequality.
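
Two of the numbers quoted in Example 1, the Chernoff information and the loosened bound that follows from Azuma's inequality, can be reproduced with the short Python sketch below. The function names and the grid search over $t$ are assumptions of this illustration.

```python
import math

p1 = [0.4, 0.6]  # P1(0), P1(1)
p2 = [0.6, 0.4]  # P2(0), P2(1)

def relative_entropy(p, q):
    """D(P||Q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def chernoff_information(p, q, grid=10001):
    """Grid approximation of (23)."""
    def h(t):
        return math.log(sum(a ** (1.0 - t) * b ** t for a, b in zip(p, q)))
    return -min(h(k / (grid - 1)) for k in range(grid))

def delta_zero_threshold(p, q):
    """delta_i in (47): D(P||Q) divided by the jump bound of (28) or (38)."""
    div = relative_entropy(p, q)
    d = max(abs(math.log(a / b) - div) for a, b in zip(p, q))
    return div / d

chernoff = chernoff_information(p1, p2)
azuma_exponent = min(delta_zero_threshold(p1, p2),
                     delta_zero_threshold(p2, p1)) ** 2 / 2.0
```

Here $\delta_1 = \delta_2 = 1/6$, so the exponent in (54) equals $1/72 \approx 1.39 \cdot 10^{-2}$, and the Chernoff information is $-\ln(2\sqrt{0.24}) \approx 2.04 \cdot 10^{-2}$.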

D. Comparison of the Exact and Lower Bounds on the Error 
Exponents, Followed by a Relation to Fisher Information 

In the following, we compare the exact and lower bounds
on the error exponents. Consider the case where there is a
single threshold on the log-likelihood ratio (i.e., the erasure
option is not provided) that is set
to zero. The exact error exponent in this case is given by the
Chernoff information (see (23)), and it is compared to
the two lower bounds on the error exponents that were derived
in the previous two subsections.

Let $\{P_\theta\}_{\theta \in \Theta}$ denote an indexed family of probability mass
functions where $\Theta$ denotes the parameter set. Assume that
$P_\theta$ is differentiable in the parameter $\theta$. Then, the Fisher
information is defined as

$$J(\theta) \triangleq \mathbb{E}_\theta \left[ \left( \frac{\partial}{\partial \theta} \ln P_\theta(x) \right)^2 \right] \tag{58}$$

where the expectation is w.r.t. the probability mass function
$P_\theta$. The divergence and Fisher information are two related
information measures, satisfying the equality

$$\lim_{\theta' \to \theta} \frac{D(P_\theta \| P_{\theta'})}{(\theta - \theta')^2} = \frac{J(\theta)}{2} \tag{59}$$

(note that if it was a relative entropy to base 2 then the
right-hand side of (59) would have been divided by $\ln 2$, and be
equal to $\frac{J(\theta)}{\ln 4}$ as in [?, Eq. (12.364)]).

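Proposition 2: Under the above assumptions,
• The Chernoff information and Fisher information are
related information measures that satisfy the equality

$$\lim_{\theta' \to \theta} \frac{C(P_\theta, P_{\theta'})}{(\theta - \theta')^2} = \frac{J(\theta)}{8}. \tag{60}$$

• Let

$$E_L(P_\theta, P_{\theta'}) \triangleq \min_{i=1,2} D\!\left( \frac{\delta_i + \gamma_i}{1 + \gamma_i} \,\Big\|\, \frac{\gamma_i}{1 + \gamma_i} \right) \tag{61}$$

be the lower bound on the error exponent in (46) which
corresponds to $P_1 \triangleq P_\theta$ and $P_2 \triangleq P_{\theta'}$, then also

$$\lim_{\theta' \to \theta} \frac{E_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} = \frac{J(\theta)}{8}. \tag{62}$$

• Let

$$\widetilde{E}_L(P_\theta, P_{\theta'}) \triangleq \min_{i=1,2} \frac{\delta_i^2}{2} \tag{63}$$

be the loosened lower bound on the error exponent in
(54) which refers to $P_1 \triangleq P_\theta$ and $P_2 \triangleq P_{\theta'}$. Then,

$$\lim_{\theta' \to \theta} \frac{\widetilde{E}_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} = \frac{a(\theta) \, J(\theta)}{8} \tag{64}$$

for some deterministic function $a$ bounded in $[0, 1]$,
and there exists an indexed family of probability mass
functions for which $a(\theta)$ can be made arbitrarily close to
zero for any fixed value of $\theta \in \Theta$.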

Proof: See Appendix B. ■

Proposition 2 shows that, in the considered setting, the
refined lower bound on the error exponent provides the correct
behavior of the error exponent for binary hypothesis testing
when the relative entropy between the pair of probability
mass functions that characterize the two hypotheses tends to
zero. This stands in contrast to the loosened error exponent,
which follows from Azuma's inequality, whose scaling may
differ significantly from the correct exponent (for a concrete
example, see the last part of the proof in Appendix B).

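Example 2: Consider the indexed family of probability
mass functions defined over the binary alphabet $\mathcal{X} = \{0, 1\}$:

$$P_\theta(0) = 1 - \theta, \quad P_\theta(1) = \theta, \quad \forall \, \theta \in (0, 1).$$

From (58), the Fisher information is equal to

$$J(\theta) = \frac{1}{\theta} + \frac{1}{1 - \theta}$$

and, at the point $\theta = 0.5$, $J(\theta) = 4$. Let $\theta_1 = 0.51$ and
$\theta_2 = 0.49$, so from (60) and (62)

$$C(P_{\theta_1}, P_{\theta_2}), \; E_L(P_{\theta_1}, P_{\theta_2}) \approx \frac{J(\theta) (\theta_1 - \theta_2)^2}{8} = 2.00 \cdot 10^{-4}.$$

Indeed, the exact values of $C(P_{\theta_1}, P_{\theta_2})$ and $E_L(P_{\theta_1}, P_{\theta_2})$ are
$2.000 \cdot 10^{-4}$ and $1.997 \cdot 10^{-4}$, respectively.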
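
The Fisher-information approximation of the Chernoff information in Example 2 can be checked with the following Python sketch. The function names and the grid search over $t$ are assumptions of this illustration; only the Chernoff information and its approximation via (60) are computed.

```python
import math

def fisher_information_bernoulli(theta):
    """J(theta) = 1/theta + 1/(1-theta) for P(0) = 1-theta, P(1) = theta."""
    return 1.0 / theta + 1.0 / (1.0 - theta)

def chernoff_information(p, q, grid=10001):
    """Grid approximation of (23)."""
    def h(t):
        return math.log(sum(a ** (1.0 - t) * b ** t for a, b in zip(p, q)))
    return -min(h(k / (grid - 1)) for k in range(grid))

theta, theta1, theta2 = 0.5, 0.51, 0.49
approx = fisher_information_bernoulli(theta) * (theta1 - theta2) ** 2 / 8.0
exact_c = chernoff_information([1.0 - theta1, theta1], [1.0 - theta2, theta2])
```

Both quantities agree to about three significant digits, consistent with the limit in (60) for nearby parameters.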

IV. Summary 

This work introduces a concentration inequality for discrete- 
parameter martingales with uniformly bounded jumps, which 
forms a refined version of Azuma's inequality. The tightness of 
this concentration inequality is studied via a large deviations 
analysis of binary hypothesis testing, and the demonstration 
of its improved tightness over Azuma's inequality is revisited 
in this context. Some links of the derived lower bounds on 
the error exponents to some information measures (e.g., the 
relative entropy and Fisher information) are obtained along 
the way. This paper presents in part the work in [12], where
further concentration inequalities that form a refinement of 
Azuma's inequality were derived, followed by some further 
applications of these concentration inequalities in information 
theory, communication, and coding theory. It is meant to 
stimulate the use of some refined versions of the Azuma- 
Hoeffding inequality in information-theoretic aspects. 



Appendix A

Some Complementary Remarks Concerning the
Construction of Doob's Martingales

This appendix is relevant to the analysis in Section III.

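Remark 1: Let $\{X_i, \mathcal{F}_i\}$ be a martingale sequence.

For every $i$, $\mathbb{E}[X_{i+1}] = \mathbb{E}\bigl[\mathbb{E}[X_{i+1} | \mathcal{F}_i]\bigr] = \mathbb{E}[X_i]$, so the
expectation of a martingale stays constant.

Remark 2: One can generate martingale sequences by the
following procedure: Given a RV $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ and an
arbitrary filtration of sub $\sigma$-algebras $\{\mathcal{F}_i\}$, let

$$X_i = \mathbb{E}[X \,|\, \mathcal{F}_i], \quad \forall \, i \in \{0, 1, \ldots\}.$$

Then, the sequence $X_0, X_1, \ldots$ forms a martingale since

1) The RV $X_i = \mathbb{E}[X | \mathcal{F}_i]$ is $\mathcal{F}_i$-measurable, and also
$\mathbb{E}[|X_i|] \le \mathbb{E}[|X|] < \infty$ (since conditioning reduces the
expectation of the absolute value).

2) By construction $\{\mathcal{F}_i\}$ is a filtration.

3) From the tower principle for conditional expectations,
since $\{\mathcal{F}_i\}$ is a filtration, then for every $i$

$$\mathbb{E}[X_{i+1} \,|\, \mathcal{F}_i] = \mathbb{E}\bigl[\mathbb{E}[X | \mathcal{F}_{i+1}] \,|\, \mathcal{F}_i\bigr] = \mathbb{E}[X \,|\, \mathcal{F}_i] = X_i \quad \text{a.s.}$$

Remark 3: In continuation to Remark 2, one can choose
$\mathcal{F}_0 = \{\Omega, \emptyset\}$ and $\mathcal{F}_n = \mathcal{F}$. Hence, $X_0, X_1, \ldots, X_n$ is a
martingale sequence where

$$X_0 = \mathbb{E}[X \,|\, \mathcal{F}_0] = \mathbb{E}[X] \quad \text{(since $X$ is independent of $\mathcal{F}_0$)}$$

$$X_n = \mathbb{E}[X \,|\, \mathcal{F}_n] = X \;\; \text{a.s.} \quad \text{(since $X$ is $\mathcal{F}$-measurable).}$$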

This has the following interpretation: At the beginning, we 
don't know anything about X, so it is initially estimated by 
its expectation. We then reveal at each step more and more 
information about X until we can specify it completely (a.s.). 
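
The construction in Remarks 2 and 3 can be illustrated with a toy example in which $X$ is the sum of $n$ i.i.d. Bernoulli bits that are revealed one at a time. The following Python sketch is only an illustration under that assumption; the function name, the choice of $X$ and the seed are hypothetical.

```python
import random

def doob_martingale_path(bits, p=0.5):
    """X_k = E[X | first k bits] for X = sum of n i.i.d. Bernoulli(p) bits.

    Given the first k bits, the conditional expectation is the partial sum
    of the revealed bits plus (n - k) * p for the bits not yet revealed.
    """
    n = len(bits)
    path = []
    partial = 0
    for k in range(n + 1):
        if k > 0:
            partial += bits[k - 1]
        path.append(partial + (n - k) * p)
    return path

random.seed(0)  # fixed seed, used only for reproducibility
bits = [random.randint(0, 1) for _ in range(10)]
path = doob_martingale_path(bits)
```

The path starts at the expectation of $X$ (here 5.0), ends at the realized value of $X$, and each jump is bounded by 0.5 in absolute value, which is the kind of bounded-difference structure that Azuma's inequality requires.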



Appendix B
Proof of Proposition 2

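The proof of (60) is based on calculus, and it is similar to
the proof of the limit in (59) that relates the divergence and
Fisher information. For the proof of (62), note that

$$C(P_\theta, P_{\theta'}) \ge E_L(P_\theta, P_{\theta'}) \ge \min_{i=1,2} \left\{ \frac{\delta_i^2}{2 \gamma_i} - \frac{\delta_i^3}{6 \gamma_i^2 (1 + \gamma_i)} \right\}. \tag{65}$$

The left-hand side of (65) holds since $E_L$ is a lower bound on
the error exponent, and the exact value of this error exponent is
the Chernoff information. The right-hand side of (65) follows
from Lemma 1 (see (57)) and the definition of $E_L$ in (61). By
definition $\gamma_i \triangleq \frac{\sigma_i^2}{d_i^2}$ and $\delta_i \triangleq \frac{\varepsilon_i}{d_i}$ where, based on (47),

$$\varepsilon_1 \triangleq D(P_\theta \| P_{\theta'}), \quad \varepsilon_2 \triangleq D(P_{\theta'} \| P_\theta). \tag{66}$$

The term on the right-hand side of (65) therefore satisfies

$$\frac{\delta_i^2}{2 \gamma_i} - \frac{\delta_i^3}{6 \gamma_i^2 (1 + \gamma_i)} = \frac{\varepsilon_i^2}{2 \sigma_i^2} - \frac{\varepsilon_i^3 d_i^3}{6 \sigma_i^4 (\sigma_i^2 + d_i^2)} \ge \frac{\varepsilon_i^2}{2 \sigma_i^2} \left( 1 - \frac{\varepsilon_i d_i}{3 \sigma_i^2} \right)$$

so it follows from (65) and the last inequality that

$$C(P_\theta, P_{\theta'}) \ge E_L(P_\theta, P_{\theta'}) \ge \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 \sigma_i^2} \left( 1 - \frac{\varepsilon_i d_i}{3 \sigma_i^2} \right) \right\}. \tag{67}$$

Based on the continuity assumption of the indexed family
$\{P_\theta\}_{\theta \in \Theta}$, it follows from (66) that

$$\lim_{\theta' \to \theta} \varepsilon_i = 0, \quad \forall \, i \in \{1, 2\}$$

and also, from (28) and (38) with $P_1$ and $P_2$ replaced by $P_\theta$
and $P_{\theta'}$ respectively,

$$\lim_{\theta' \to \theta} d_i = 0, \quad \forall \, i \in \{1, 2\}.$$

Since $\varepsilon_i$ and $\sigma_i^2$ are both of the order of $(\theta - \theta')^2$ (see (59)
and (69)), the ratio $\varepsilon_i / \sigma_i^2$ stays bounded, so the correction
term $\frac{\varepsilon_i d_i}{3 \sigma_i^2}$ in (67) vanishes in the limit. It therefore
follows from (60) and (67) that

$$\frac{J(\theta)}{8} \ge \lim_{\theta' \to \theta} \frac{E_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} \ge \lim_{\theta' \to \theta} \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 \sigma_i^2 (\theta - \theta')^2} \right\}. \tag{68}$$

The idea is to show that the limit on the right-hand side of
this inequality is $\frac{J(\theta)}{8}$ (same as the left-hand side), and hence,
the limit of the middle term is also $\frac{J(\theta)}{8}$.

$$\begin{aligned}
\lim_{\theta' \to \theta} \frac{\varepsilon_1^2}{2 \sigma_1^2 (\theta - \theta')^2}
&\overset{(a)}{=} \lim_{\theta' \to \theta} \frac{D(P_\theta \| P_{\theta'})^2}{2 \sigma_1^2 (\theta - \theta')^2} \\
&\overset{(b)}{=} \frac{J(\theta)}{4} \lim_{\theta' \to \theta} \frac{D(P_\theta \| P_{\theta'})}{\sigma_1^2} \\
&\overset{(c)}{=} \frac{J(\theta)}{4} \lim_{\theta' \to \theta} \frac{D(P_\theta \| P_{\theta'})}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \| P_{\theta'}) \right)^2} \\
&\overset{(d)}{=} \frac{J(\theta)}{4} \lim_{\theta' \to \theta} \frac{D(P_\theta \| P_{\theta'})}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} \right)^2 - D(P_\theta \| P_{\theta'})^2} \\
&\overset{(e)}{=} \frac{J(\theta)^2}{8} \lim_{\theta' \to \theta} \frac{(\theta - \theta')^2}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} \right)^2 - D(P_\theta \| P_{\theta'})^2} \\
&\overset{(f)}{=} \frac{J(\theta)^2}{8} \lim_{\theta' \to \theta} \frac{(\theta - \theta')^2}{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} \right)^2} \\
&\overset{(g)}{=} \frac{J(\theta)}{8}
\end{aligned} \tag{69}$$

where equality (a) follows from (66), equalities (b), (e) and (f)
follow from (59), equality (c) follows from (29) with $P_1 = P_\theta$
and $P_2 = P_{\theta'}$, equality (d) follows from the definition of the
divergence, and equality (g) follows by calculus (the required
limit is calculated by using L'Hôpital's rule twice) and from
the definition of Fisher information in (58). Similarly, also

$$\lim_{\theta' \to \theta} \frac{\varepsilon_2^2}{2 \sigma_2^2 (\theta - \theta')^2} = \frac{J(\theta)}{8}$$

so

$$\lim_{\theta' \to \theta} \min_{i=1,2} \left\{ \frac{\varepsilon_i^2}{2 \sigma_i^2 (\theta - \theta')^2} \right\} = \frac{J(\theta)}{8}.$$

Hence, it follows from (68) that $\lim_{\theta' \to \theta} \frac{E_L(P_\theta, P_{\theta'})}{(\theta - \theta')^2} = \frac{J(\theta)}{8}$.
This completes the proof of (62).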

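We now prove (64). From (47) and (63),

$$\widetilde{E}_L(P_\theta, P_{\theta'}) = \min_{i=1,2} \frac{\varepsilon_i^2}{2 d_i^2}$$

with $\varepsilon_1$ and $\varepsilon_2$ in (66). Hence,

$$\lim_{\theta' \to \theta} \frac{\widetilde{E}_L(P_\theta, P_{\theta'})}{(\theta' - \theta)^2} \le \lim_{\theta' \to \theta} \frac{\varepsilon_1^2}{2 d_1^2 (\theta' - \theta)^2}$$

and from (69) and the last inequality it follows that

$$\lim_{\theta' \to \theta} \frac{\widetilde{E}_L(P_\theta, P_{\theta'})}{(\theta' - \theta)^2} \le \frac{J(\theta)}{8} \lim_{\theta' \to \theta} \frac{\sigma_1^2}{d_1^2} = \frac{J(\theta)}{8} \lim_{\theta' \to \theta} \frac{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \| P_{\theta'}) \right)^2}{\left( \max_{x \in \mathcal{X}} \left| \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \| P_{\theta'}) \right| \right)^2} \tag{70}$$

where the last equality follows from (28) and (29) with $P_1 = P_\theta$
and $P_2 = P_{\theta'}$.

It is clear that the second term on the right-hand side of
(70) is bounded between zero and one (if the limit exists).
This limit can be made arbitrarily small, i.e., there exists an
indexed family of probability mass functions $\{P_\theta\}_{\theta \in \Theta}$ for
which the second term on the right-hand side of (70) can be
made arbitrarily close to zero. For a concrete example, let
$\alpha \in (0, 1)$ be fixed, and $\theta \in \mathbb{R}^+$ be a parameter that defines
the following indexed family of probability mass functions
over the ternary alphabet $\mathcal{X} = \{0, 1, 2\}$:

$$P_\theta(0) = \frac{\theta (1 - \alpha)}{1 + \theta}, \quad P_\theta(1) = \alpha, \quad P_\theta(2) = \frac{1 - \alpha}{1 + \theta}.$$

Then, it follows by calculus that for this indexed family

$$\lim_{\theta' \to \theta} \frac{\sum_{x \in \mathcal{X}} P_\theta(x) \left( \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \| P_{\theta'}) \right)^2}{\left( \max_{x \in \mathcal{X}} \left| \ln \frac{P_\theta(x)}{P_{\theta'}(x)} - D(P_\theta \| P_{\theta'}) \right| \right)^2} = (1 - \alpha) \theta$$

so, for any $\theta \in \mathbb{R}^+$, the above limit can be made arbitrarily
close to zero by choosing $\alpha$ close enough to 1. This completes
the proof of (64), and also the proof of Proposition 2.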
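
The limit stated for the ternary family can be probed numerically by evaluating the ratio $\sigma_1^2 / d_1^2$ for two nearby parameters. The following Python sketch does this with a small finite difference; the function names, the step size and the parameter values (chosen with $\theta < 1$) are assumptions of this illustration, and the check is a numerical approximation rather than a proof.

```python
import math

def ternary_pmf(theta, alpha):
    """P_theta over {0, 1, 2} for the family defined above."""
    return [theta * (1.0 - alpha) / (1.0 + theta),
            alpha,
            (1.0 - alpha) / (1.0 + theta)]

def variance_to_jump_ratio(theta, theta_prime, alpha):
    """sigma_1^2 / d_1^2 for P_1 = P_theta and P_2 = P_theta_prime."""
    p = ternary_pmf(theta, alpha)
    q = ternary_pmf(theta_prime, alpha)
    div = sum(a * math.log(a / b) for a, b in zip(p, q))
    centered = [math.log(a / b) - div for a, b in zip(p, q)]
    sigma2 = sum(a * c * c for a, c in zip(p, centered))
    d = max(abs(c) for c in centered)
    return sigma2 / (d * d)

theta, alpha = 0.5, 0.9  # hypothetical values with theta < 1
ratio = variance_to_jump_ratio(theta, theta + 1e-4, alpha)
smaller = variance_to_jump_ratio(theta, theta + 1e-4, 0.99)
```

For these values the ratio is close to $(1 - \alpha)\theta = 0.05$, and it shrinks further as $\alpha$ approaches 1.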